Abstract Although a variety of techniques are available today for gray-scale image compression, 
a complete evaluation of these techniques cannot be made as there is no single reliable objective 
criterion for measuring the error in compressed images. The traditional subjective criteria are 
burdensome, and usually inaccurate or inconsistent. On the other hand, being the most common 
objective criterion, the mean square error (MSE) does not have a good correlation with the 
viewers' response. It is now understood that in order to have a reliable quality measure, a 
representative model of the complex human visual system is required. In this paper, we survey 
and give a classification of the criteria for the evaluation of monochrome image quality. 


1. Introduction 

There is an ever increasing demand for transmission and storage of vast amounts of information in 
data processing environments today. To reduce the large costs involved, data compression is a 
widely accepted tool which aims at minimizing the amount of data to be stored or transmitted. A 
variety of data compression techniques have been developed in the past few decades for different 
types of industrial, commercial, and educational applications. These techniques can be classified 
into two major categories: lossless (exact) and lossy (inexact) [1, 2, 3]. Lossless compression is 
concerned with reconstructing an exact replica of the original input data stream. It is essentially 
used in text compression where no loss can be tolerated. Disastrous results may be encountered 
for even a single bit of loss in, for example, program files or database records. The techniques in 
this category typically reduce text size 40 to 80%, while those developed for specific applications 
may achieve compression over 90%. Lossy data compression causes some amount of loss which 
is considered to be a concession for a drastic increase in compression. Lossy compression 
techniques are effective and appropriate primarily for digitized voice and images for two reasons. 
Firstly, huge volumes of voice and images are normally generated in a typical application and, 
secondly, digital representation of analog signals is only an approximation, introducing a certain 
loss to begin with. 

Numerous image compression techniques [2-6] exist today with the common goal of reducing the 
number of bits needed to store or to transmit images. The efficiency of a compression algorithm is 
generally measured using three criteria: 


1) compression amount, 

2) implementation complexity, and 

3) resulting distortion. 


The amount of compression can readily be obtained using several definitions, among which there 
are compression ratio, figure of merit, and compression percentage. Algorithmic complexity, on 
the other hand, can be measured by considering the data structures as well as the type and number 
of operations required. The difficulty in evaluating a lossy compression algorithm comes from the 
fact that there is no reliable and consistent measure for determining the magnitude of distortion 
resulting from the loss. In other words, we lack a useful and practical measure for image quality 
assessment! Such a measure is not only needed for comparing images produced by different 
techniques, but it is also instrumental in designing image processing/compression algorithms. 

In this paper, we survey the criteria available for the evaluation of monochrome image quality. In 
spite of the fact that some of the measures found in the literature have specifically been used for 
rating the performance of image processing systems, they are applicable in evaluating compression 
algorithms equally well. 


2. Image Quality Measures 


It is possible to classify image quality criteria as given in Figure 1. 

Figure 1. Classification of Image Quality Criteria 


2.1 Subjective Criteria 

As the final users of images are humans, the most reliable and commonly used assessment of image 
quality is the subjective rating by human observers. Both expert and nonexpert observers are used 
in experiments; nonexperts represent the average viewer while experts are believed to be able to 
give better, more 'refined' assessments of image quality since they have been trained and are 
familiar with images and their distortions. 

In absolute evaluation, the observers view an image and assess its quality by assigning to it a 
category in a given rating scale, whereas in comparative evaluation, a set of images is ranked 
from best to worst by the observers. The rating scales that appear in the relevant literature [5, 12, 
14, 15, 19] are listed in Table 1. 

Table 1. Rating Scales Used in Subjective Evaluation 


A. 
   5. Excellent 
   4. Good 
   3. Fair 
   2. Poor 
   1. Unsatisfactory (bad) 

B. 
   7. Best 
   6. Well above average 
   5. Slightly above average 
   4. Average 
   3. Slightly below average 
   2. Well below average 
   1. Worst 

C. 
   1. Not noticeable (perceptible) 
   2. Just noticeable (perceptible) 
   3. Definitely noticeable (perceptible) but only slight impairment 
   4. Impairment not objectionable 
   5. Somewhat objectionable 
   6. Definitely objectionable 
   7. Extremely objectionable 

D. 
    3  Much better 
    2  Better 
    1  Slightly better 
    0  Same 
   -1  Slightly worse 
   -2  Worse 
   -3  Much worse 

E. 
   5. Imperceptible 
   4. Perceptible but not annoying 
   3. Slightly annoying 
   2. Annoying 
   1. Very annoying 

F. 
   10, 9    Very good 
   8, 7     Good 
   6, 5, 4  Fair 
   3, 2     Bad 
   1, 0     Very bad 


The mean rating of a group of observers who join the evaluation is usually computed by 

$$R = \frac{\sum_{k=1}^{n} s_k n_k}{\sum_{k=1}^{n} n_k}$$

where $s_k$ = the score corresponding to the kth rating, $n_k$ = the number of observers with this 
rating, and n = the number of grades in the scale. 
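
The mean rating can be illustrated with a short Python sketch. It assumes the score-weighted sum is normalized by the total number of observers; the function name `mean_rating` and the observer tallies are hypothetical and serve only as an example.

```python
def mean_rating(scores, counts):
    """Mean rating: sum of s_k * n_k divided by the total number of observers.

    scores[k] is the score s_k of the kth grade; counts[k] is the number
    of observers n_k who chose that grade.
    """
    if len(scores) != len(counts):
        raise ValueError("scores and counts must have the same length")
    total_observers = sum(counts)
    if total_observers == 0:
        raise ValueError("at least one observer is required")
    return sum(s * n for s, n in zip(scores, counts)) / total_observers


# Scale A of Table 1 (5 = Excellent ... 1 = Unsatisfactory),
# with made-up tallies for ten observers.
scores = [5, 4, 3, 2, 1]
counts = [2, 5, 2, 1, 0]
print(mean_rating(scores, counts))
```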

Bubble sort [5, 11, 22] is another technique used in image rating. With this technique, the subject 
compares two images A and B from a group and determines their order. Assuming that the order 
is AB, he/she takes a third image and compares it with B to establish the order ABC or ACB. If 
the order is ACB, then another comparison is made to determine the new order. The procedure 
continues until all the images have been used, allowing the best pictures to bubble to the top if no 
ties are accepted. 
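
The ranking procedure described above can be sketched as follows. The observer is simulated by a hypothetical `prefers(a, b)` callback that returns True when image `a` is judged better than image `b`; in a real experiment this decision would come from a human subject. The image names and hidden quality values are invented for the example.

```python
def rank_by_pairwise_comparison(images, prefers):
    """Order images from best to worst using pairwise judgments only.

    Each new image is placed at the bottom of the current order and
    bubbles upward while the observer prefers it to the image above it.
    """
    order = []
    for image in images:
        order.append(image)
        i = len(order) - 1
        while i > 0 and prefers(order[i], order[i - 1]):
            order[i], order[i - 1] = order[i - 1], order[i]
            i -= 1
    return order


# Simulated observer: a hidden quality value per image (illustrative only).
hidden_quality = {"A": 3, "B": 1, "C": 2, "D": 5}
ranking = rank_by_pairwise_comparison(
    ["A", "B", "C", "D"],
    lambda a, b: hidden_quality[a] > hidden_quality[b],
)
print(ranking)
```

Because an image moves up only when it is strictly preferred, ties leave the earlier image in place.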

It is important to note that the results of subjective rating are affected by a number of factors 
including 

a) type and range of images, 

b) level of expertise of the observers, and 

c) experimental conditions. 

If standards can be established for these factors, the results obtained in different locations and at 
different times may then become comparable. 



2.2 Quantitative Criteria 

Quantitative measures for image quality can be divided into two classes: Univariate and bivariate 
[19]. A univariate measure assigns to a single image a numerical value based upon measurements 
of the image field, and a bivariate measure is a numerical comparison between two images. 

Fidelity measurements are usually made using an array of discrete image samples, although a 
continuous image field can also be generated by two-dimensional interpolation of the sample array 
if the overhead is justified. Image error measures can be defined in either spatial or frequency 
domain. 

Denoting the samples on the original image field as F(j,k), a spatial domain, univariate quality 
rating may be expressed in general as 

$$Q = \sum_{j=1}^{M} \sum_{k=1}^{N} O\{F(j,k)\}$$

for M×N samples, where O{·} is some operator. 


Bivariate measures are more frequently used in image quality measurement. If $\hat{F}(j,k)$ denotes the 
samples on the degraded image field, a number of measures can be established to determine the 
closeness of the two image fields. The alternatives are listed below [5, 9, 12, 19, 22-25]. 


(i) 

$$L_p = \left\{ \frac{1}{MN} \sum_{j=1}^{M} \sum_{k=1}^{N} \left| F(j,k) - \hat{F}(j,k) \right|^p \right\}^{1/p}$$

A major class of bivariate error measures is based on the $L_p$-norm. The factor p determines the 
relative significance of errors of different magnitudes. $L_1$ is the average absolute error and $L_2$ is 
the commonly used root mean square error (RMSE). As the value of p is increased, a greater 
relative emphasis is given to large errors in the image. 


(ii) Low order moment of a power spectrum. 


(iii) 

$$K = \sum_{j=1}^{M} \sum_{k=1}^{N} F(j,k) \, \hat{F}(j,k)$$

This measure is obtained by discretizing the continuous cross-correlation function. It may be 
normalized by the reference image energy to give unity as the peak correlation: 

$$\frac{\sum_{j=1}^{M} \sum_{k=1}^{N} F(j,k) \, \hat{F}(j,k)}{\sum_{j=1}^{M} \sum_{k=1}^{N} [F(j,k)]^2}$$




(iv) Correlation quality: 

$$CQ = \frac{\sum_{j=1}^{M} \sum_{k=1}^{N} F(j,k) \, \hat{F}(j,k)}{\sum_{j=1}^{M} \sum_{k=1}^{N} F(j,k)}$$


(v) Structural content: 

$$SC = \frac{\sum_{j=1}^{M} \sum_{k=1}^{N} [F(j,k)]^2}{\sum_{j=1}^{M} \sum_{k=1}^{N} [\hat{F}(j,k)]^2}$$


(vi) Normalized absolute error between the reference and degraded image fields: 

$$NAE = \frac{\sum_{j=1}^{M} \sum_{k=1}^{N} \left| O\{F(j,k)\} - O\{\hat{F}(j,k)\} \right|}{\sum_{j=1}^{M} \sum_{k=1}^{N} \left| O\{F(j,k)\} \right|}$$


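(vii) Normalized mean square error: 

$$NMSE = \frac{\sum_{j=1}^{M} \sum_{k=1}^{N} \left[ O\{F(j,k)\} - O\{\hat{F}(j,k)\} \right]^2}{\sum_{j=1}^{M} \sum_{k=1}^{N} \left[ O\{F(j,k)\} \right]^2}$$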


(viii) Peak mean square error: 

$$PMSE = \frac{\frac{1}{MN} \sum_{j=1}^{M} \sum_{k=1}^{N} \left[ O\{F(j,k)\} - O\{\hat{F}(j,k)\} \right]^2}{A^2}$$

where A represents the maximum value of O{F(j,k)}. 

The definitions used for the operator O{·} in (vii) and (viii) are 

(a) $F(j,k)$ 
(b) $[F(j,k)]^{v}$ (Power law) 
(c) $k_1 \log_b [k_2 + k_3 F(j,k)]$ (Logarithmic) 
(d) $[F(x,y) \circledast H(x,y)] \, \delta(x - j\Delta x, y - k\Delta y)$ (Convolution) 


(ix) Laplacian mean square error: 

$$LMSE = \frac{\sum_{j=2}^{M-1} \sum_{k=2}^{N-1} \left[ O\{F(j,k)\} - O\{\hat{F}(j,k)\} \right]^2}{\sum_{j=2}^{M-1} \sum_{k=2}^{N-1} \left[ O\{F(j,k)\} \right]^2}$$

where O{F(j,k)} = F(j+1,k) + F(j-1,k) + F(j,k+1) + F(j,k-1) - 4F(j,k) 


In many applications, the mean square error (however it is defined) is often expressed in terms of a 
signal-to-noise ratio defined in decibels. 


(x) Image fidelity: 

$$IF = 1 - \frac{\sum_{j=1}^{M} \sum_{k=1}^{N} \left[ F(j,k) - \hat{F}(j,k) \right]^2}{\sum_{j=1}^{M} \sum_{k=1}^{N} [F(j,k)]^2}$$


(xi) Difference[j,k] = $F(j,k) - \hat{F}(j,k)$ 

(xii) $\frac{1}{MN} \sum_{j=1}^{M} \sum_{k=1}^{N} \mathrm{Difference}[j,k]$ 

(xiii) Max { |Difference[j,k]| } 

(xiv) Histogram of the compression error (constructed by plotting the number of x's versus x for 
all values of x found in the difference matrix). 

(xv) Hosaka plots 

(xvi) Sensitivity and predictive value positive curves 
(xvii) Rate-distortion curves. 
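
Several of the bivariate measures listed above can be sketched in a few lines of Python. The sketch takes the operator O{·} to be the identity (definition (a)) and represents images as nested lists of gray levels; the function names and the tiny 2×2 test images are illustrative assumptions, not part of the cited definitions.

```python
def _sample_pairs(ref, deg):
    """Return a list of (F, F_hat) sample pairs from two equal-sized images."""
    if len(ref) != len(deg) or any(len(r) != len(d) for r, d in zip(ref, deg)):
        raise ValueError("images must have the same dimensions")
    return [(f, g) for row_r, row_d in zip(ref, deg) for f, g in zip(row_r, row_d)]


def lp_norm(ref, deg, p):
    """Measure (i): p = 1 gives the average absolute error, p = 2 the RMSE."""
    pairs = _sample_pairs(ref, deg)
    return (sum(abs(f - g) ** p for f, g in pairs) / len(pairs)) ** (1.0 / p)


def nmse(ref, deg):
    """Measure (vii) with O{.} taken as the identity."""
    pairs = _sample_pairs(ref, deg)
    return sum((f - g) ** 2 for f, g in pairs) / sum(f * f for f, _ in pairs)


def image_fidelity(ref, deg):
    """Measure (x): one minus the error energy normalized by the reference energy."""
    return 1.0 - nmse(ref, deg)


def structural_content(ref, deg):
    """Measure (v): reference energy divided by degraded-image energy."""
    pairs = _sample_pairs(ref, deg)
    return sum(f * f for f, _ in pairs) / sum(g * g for _, g in pairs)


def max_abs_difference(ref, deg):
    """Measure (xiii): largest absolute pixel difference."""
    return max(abs(f - g) for f, g in _sample_pairs(ref, deg))


reference = [[10, 20], [30, 40]]
degraded = [[12, 20], [30, 36]]
print(lp_norm(reference, degraded, 1))   # average absolute error
print(lp_norm(reference, degraded, 2))   # RMSE
print(nmse(reference, degraded))
print(max_abs_difference(reference, degraded))
```

Raising p in `lp_norm` makes the single error of 4 dominate the result, which is the emphasis on large errors noted under (i).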

It is reported that image quality assessment can be improved by incorporating into the evaluation 
process some model of the human visual system (HVS). The HVS is incorporated into the quality 
measure using two distinct approaches. In the first approach, the $L_p$ norm (or one of its variants) 
is employed, attaching a weight to the image samples either in the spatial or frequency domain. The 
second approach is concerned with weighting the digital image power spectrum. 

In one of the earliest studies, the transformation 

$$O\{\cdot\} = H_L(x,y) \circledast O_N\{\cdot\}$$

is used on both the continuous image field F(x,y) and the degraded image field $\hat{F}(x,y)$ before 
applying the integral square error, where the impulse response $H_L(x,y)$ represents the lateral 
inhibition process, and the point nonlinearity $O_N\{\cdot\}$ models the response of the eye's 
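photoreceptors [11]. In the Fourier domain, $H_L$ is defined as 

$$H_L(\omega) = a \left[ c + \left( \frac{\omega}{\omega_0} \right)^{k_1} \right] \exp\left[ -\left( \frac{\omega}{\omega_0} \right)^{k_2} \right]$$

where $\omega = (\omega_1^2 + \omega_2^2)^{1/2}$, and $O_N\{\cdot\} = [\cdot]^{1/3}$ is chosen. The experiments show that a = 2.6, 
c = 0.0192, $\omega_0$ = 1/0.114, $k_1$ = 1 and $k_2$ = 1.1 are the suitable parameter values. 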

In another study [12] to find an objective measure which closely mirrors the performance of the 
human viewer, the error measure 

$$E_p = \left[ \frac{1}{m} \sum_{i=1}^{m} |e_i|^p \right]^{1/p}$$

where m = number of picture elements (pels) in a picture, $e_i = x_i - \hat{x}_i$, $x_i$ = the value of the pel in 
the original picture and $\hat{x}_i$ = the value of the pel in the distorted picture, is tried for p = 1, 2, 3, 4, 
6. The conclusion is that $E_p$ is a very good estimate of impairment rating where the type of 
distortion is additive white noise. In the same study, another measure of picture impairment is 
obtained using 

,i/p 


EMr 




i=l 


to reflect the masking effect of the signal. $w_i$ denotes the value of the weighting function at pel i 
and is derived from an activity function that is a measure of the variability of the signal in the 
neighborhood of pel i. Three different forms of activity functions are studied: 

$A_{max}$: measures the maximum signal change between any pair of pels in a neighborhood 
consisting of the pel being evaluated plus the eight surrounding pels. 

$A_{av}$: sums the deviations of the same neighborhood of points from the neighborhood average $\bar{x}$. 

$A_{dp}$: provides the weighted sum of the magnitude of the surrounding element difference (slope) 
in both the horizontal and vertical directions. 


In all three cases $w_i$ is obtained from $A_i$ so as to span a range from 1.0 to 10.0. There is also an 
attempt in [12] to obtain a local measure of image quality. Relying on the postulate that the viewer 
rates the image by some weighted average of the worst two or three patches, Limb divides the 
image into a rectangular array of squares and calculates a local measure for each square with and 
without masking. He also tries the formula 

$$EM_p = \left[ \frac{1}{m} \sum_{i=1}^{m} \left| \frac{e_i}{w_i} \right|^p \right]^{1/p}$$

in his local error analysis. The quantitative model that Limb uses for the human viewer includes 
some error filtering as well. Comparison of the simple RMSE as a measure of image quality with 
the best error measure predictions of the model shows that RMSE performs surprisingly well. 
This results, Limb explains, from the fact that in most distorted images, quality is determined 
mainly by the visibility of distortion in flat areas, where it is more visible, and consequently 
masking has little effect. For images where distortion is greater at edges, however, the 
RMSE is claimed to be less satisfactory. 
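
A rough Python sketch of a masked error of this kind is given below. It makes several assumptions that are not stated in the text: the activity is the $A_{max}$ form (largest minus smallest value in the 3×3 neighborhood, truncated at the image border), the weights are obtained by a linear mapping of the activity onto the range 1.0 to 10.0, and each error is divided by its weight before the p-th power average is taken. All function names are hypothetical.

```python
def activity_max(image):
    """A_max: largest gray-level change within each pel's 3x3 neighborhood."""
    rows, cols = len(image), len(image[0])
    activity = []
    for j in range(rows):
        row = []
        for k in range(cols):
            hood = [image[a][b]
                    for a in range(max(0, j - 1), min(rows, j + 2))
                    for b in range(max(0, k - 1), min(cols, k + 2))]
            row.append(max(hood) - min(hood))
        activity.append(row)
    return activity


def weights_from_activity(activity, low=1.0, high=10.0):
    """Map activity values linearly onto [low, high] (assumed mapping)."""
    flat = [a for row in activity for a in row]
    a_min, a_max = min(flat), max(flat)
    if a_max == a_min:
        return [[low for _ in row] for row in activity]
    scale = (high - low) / (a_max - a_min)
    return [[low + (a - a_min) * scale for a in row] for row in activity]


def masked_error(original, distorted, p=2):
    """p-th power average of the errors after division by the masking weights."""
    weights = weights_from_activity(activity_max(original))
    terms = [abs((x - y) / w) ** p
             for row_o, row_d, row_w in zip(original, distorted, weights)
             for x, y, w in zip(row_o, row_d, row_w)]
    return (sum(terms) / len(terms)) ** (1.0 / p)


flat_image = [[5, 5], [5, 5]]
noisy_flat = [[6, 6], [6, 6]]
print(masked_error(flat_image, noisy_flat))  # no masking in a flat area
```

In a flat image every weight equals 1.0, so the masked error reduces to the plain RMSE, which is in line with the observation that masking has little effect when distortion is judged mainly in flat areas.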

The results of a subjective evaluation on twelve versions of a black and white image and the rank 
ordering obtained with three computational measures are presented by Hall [22]. He compares the 
performance of the measures NMSE, LMSE, and PMSE, which are defined for an N×N discrete 
image as 

$$NMSE = \frac{\sum_{m=1}^{N} \sum_{n=1}^{N} \left[ f(m,n) - \hat{f}(m,n) \right]^2}{\sum_{m=1}^{N} \sum_{n=1}^{N} [f(m,n)]^2}$$

$$LMSE = \frac{\sum_{m=2}^{N-1} \sum_{n=2}^{N-1} \left[ G(m,n) - \hat{G}(m,n) \right]^2}{\sum_{m=2}^{N-1} \sum_{n=2}^{N-1} [G(m,n)]^2}$$

where G(m,n) = f(m+1,n) + f(m-1,n) + f(m,n+1) + f(m,n-1) - 4f(m,n), and 

$$PMSE = \frac{\sum_{m=1}^{N} \sum_{n=1}^{N} \left[ z(m,n) - \hat{z}(m,n) \right]^2}{\sum_{m=1}^{N} \sum_{n=1}^{N} [z(m,n)]^2}$$

where $z(m,n)$ and $\hat{z}(m,n)$ are given by 

$$z(m,n) = \ln[f(m,n)] \circledast h_{bp}(m,n)$$

and 

$$\hat{z}(m,n) = \ln[\hat{f}(m,n)] \circledast h_{bp}(m,n)$$

The function $h_{bp}(m,n)$ is a rectangular coordinate form of the point spread function of the HVS. 
Note that PMSE as defined here differs from the peak mean square error listed in (viii). 

In his comparison, Hall finds that the correlation between PMSE and the subjective ranking 
(obtained by using bubble sort) of the data set is higher than that of NMSE and LMSE. 
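
As an illustration of the Laplacian-based measure compared by Hall, the following Python sketch evaluates LMSE over the interior pixels of two equal-sized images stored as nested lists. The function names and the 3×3 test images are assumptions made for the example.

```python
def laplacian(f, m, n):
    """Discrete Laplacian G(m,n) built from the four nearest neighbors."""
    return f[m + 1][n] + f[m - 1][n] + f[m][n + 1] + f[m][n - 1] - 4 * f[m][n]


def lmse(original, degraded):
    """Laplacian mean square error over interior pixels only."""
    rows, cols = len(original), len(original[0])
    numerator = 0.0
    denominator = 0.0
    for m in range(1, rows - 1):
        for n in range(1, cols - 1):
            g = laplacian(original, m, n)
            g_hat = laplacian(degraded, m, n)
            numerator += (g - g_hat) ** 2
            denominator += g * g
    if denominator == 0:
        raise ValueError("original image has no Laplacian energy")
    return numerator / denominator


original = [[0, 0, 0], [0, 4, 0], [0, 0, 0]]
degraded = [[0, 0, 0], [0, 2, 0], [0, 0, 0]]
print(lmse(original, degraded))
```

Because the Laplacian responds to edges and isolated detail, this measure penalizes loss of sharpness more heavily than a plain NMSE would.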

Nill [8] arrives at a quality measure in the 2-D discrete Fourier spatial frequency domain. This 
measure is expressed as 

$$\frac{1}{K} \sum_{i=1}^{B} w_i \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} H^2(r) \left[ F_i(u,v) - \hat{F}_i(u,v) \right]^2,$$

where B = number of subimage blocks in scene, 

K = normalization factor such as total energy, 

H(r) = rotationally symmetric spatial frequency response of HVS, $r = \sqrt{u^2 + v^2}$, 

$F_i$, $\hat{F}_i$ = Fourier transform of unprocessed and processed subimage i, respectively, 

M, N = number of Fourier coefficients + 1, in orthogonal u, v directions, 

$w_i$ = subimage i structure weighting factor, proportional to subimage's intensity level 
variance. 


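Using $H(r) = (0.2 + 0.45r) e^{-0.18r}$, he then constructs the function 

$$|A(r)| H(r) = \begin{cases} 0.05 \, e^{r^{0.554}} & \text{for } r < 7 \\ e^{-9 \left[ \left| \log_{10} r - \log_{10} 9 \right| \right]^{2.3}} & \text{for } r \geq 7 \end{cases}$$
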

for dealing with image cosine transforms instead of image Fourier transforms. Finally, he argues 
that (i) combining the HVS model with the image cosine transform will result in better performance 
in image compression and image quality assessment applications, and (ii) performance in quality 
assessment should also be enhanced by inclusion of the subimage structure weighting. 
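
The two frequency responses quoted from Nill can be evaluated numerically with the sketch below. The constants (0.2, 0.45, 0.18 for H(r), and 0.05, 0.554, 9, 2.3 for the cosine-transform version) are taken as read from the text, and the function names are hypothetical.

```python
import math


def hvs_response(r):
    """H(r) = (0.2 + 0.45 r) exp(-0.18 r), with r the radial spatial frequency."""
    return (0.2 + 0.45 * r) * math.exp(-0.18 * r)


def hvs_dct_response(r):
    """Piecewise |A(r)| H(r) for use with cosine transforms (break point at r = 7)."""
    if r < 7:
        return 0.05 * math.exp(r ** 0.554)
    return math.exp(-9 * abs(math.log10(r) - math.log10(9)) ** 2.3)


for r in (0.0, 2.0, 5.0, 9.0, 20.0):
    print(r, round(hvs_response(r), 4), round(hvs_dct_response(r), 4))
```

Both curves have a band-pass shape: H(r) rises from 0.2 at r = 0 to a maximum near r = 5.1 and then decays, while the cosine-transform version reaches unity at r = 9.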

Marmolin [9] addresses the question of using the mean squared error (MSE) measure as a quality 
criterion in image processing, and evaluates the predictive power of 

$$E = \left[ \frac{1}{n} \sum_{i=1}^{n} |D_i|^p \right]^{1/p}, \qquad D_i = a_i \, g(x_i - y_i)$$

where g = some processing function that determines the visibility of the error, $a_i$ = a weight related 
to the informative value of pixel i, p = a factor that determines the relative importance of small 
and large errors, $x_i$ = the gray level of pixel i in the original image, $y_i$ = the gray level of pixel i in 
the processed image, and n = the number of pixels. He investigates the performance of different 
definitions for $D_i$, and compares them to that of the mean squared error 

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2$$

The results obtained indicate that MSE is an unsatisfactory measure of perceived similarity, and 
that no single measure is valid for every image set used. 

Saghri, Cheatham, and Habibi [10] state that once an image U(x,y) and its reproduction U'(x,y) have been 
subjected to the HVS model, then the mean square error 

$$d(U,U') = \frac{1}{N} \iint \left[ U(x,y) - U'(x,y) \right]^2 dx \, dy$$

where N is the image area or the number of pixels, may be considered as a meaningful measure of 
image quality. Adopting the approach of Mannos and Sakrison, they use in their HVS model 

$$f(u) = u^{0.33}$$

where u is the pixel intensity, and 


A(f r ) = 


0.2 + 0.81 



where $f_r = (f_x^2 + f_y^2)^{1/2}$. The correction (developed by Nill) 

$$C(f_r) = \left\{ \frac{1}{4} + \frac{1}{\pi^2} \left[ \log_e \left( \frac{2\pi f_r}{\alpha} + \sqrt{\frac{4\pi^2 f_r^2}{\alpha^2} + 1} \right) \right]^2 \right\}^{1/2}$$

to the HVS model $A(f_r)$ is then applied to give the DCT version 

$$A_{DCT} = A(f_r) \, C(f_r).$$

As an alternative to the MSE, the authors propose the so-called information content (IC). The IC 
of an image for a given resolution is defined as the sum of the magnitudes of its DCT spectral 
components after they have been appropriately normalized based on HVS sensitivity models for 
that particular resolution. The plot of IC versus the resolution provides some insight into the 
quality of a given image. The preliminary results are reportedly promising, but much more 
experimentation is needed to adjust the numerous parameters of the system for highest achievable 
correlation with the subjective measure. 

The work by Ngan, Leong, and Singh [16] describes an adaptive cosine transform coding scheme 
for color images. The cosine transform coefficients are weighted by the HVS function given by 
Nill to generate the coefficients in perceptual domain. To determine the parameters of the HVS 
filter 


$$H(\omega) = (a + b\omega) \exp(-c\omega)$$

plots of SNR versus peak frequency are used. The SNR is defined by 

$$SNR = -10 \log_{10} \left\{ \frac{1}{(512)^2} \sum_{j=0}^{511} \sum_{k=0}^{511} \frac{\left[ f(j,k) - \hat{f}(j,k) \right]^2}{(255)^2} \right\}$$

where $f(j,k)$ and $\hat{f}(j,k)$ are the original and reconstructed pixels, respectively. Their results show 
that the subjective quality of the reconstructed images at a bit rate of 0.4 bit/pixel or a compression 
ratio of 60:1 is very good. 
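
An SNR of this form, a mean square error normalized by the squared peak value and expressed in decibels, can be computed as in the following sketch. The function name `snr_db` is hypothetical, and the code generalizes the fixed 512×512 size and peak of 255 to arbitrary image dimensions and an optional `peak` argument, which is an assumption made for convenience.

```python
import math


def snr_db(original, reconstructed, peak=255):
    """SNR in dB: -10 log10 of the mean squared error divided by peak squared."""
    squared_error = 0.0
    count = 0
    for row_o, row_r in zip(original, reconstructed):
        for f, f_hat in zip(row_o, row_r):
            squared_error += (f - f_hat) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return -10 * math.log10(mse / peak ** 2)


original = [[100, 120], [140, 160]]
reconstructed = [[101, 119], [141, 159]]
print(round(snr_db(original, reconstructed), 2))
```

Each tenfold reduction in mean square error raises this SNR by 10 dB.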




Khafizov, Fisher, and Kiselyov [18] propose a new approach to simulate human visual perception 
in order to devise a tool for measuring distance between images. Defining the error matrix by 

E = X-Y, 

where X and Y are the two images to be compared, they renormalize each error in E with respect to 
other errors. Renormalization is the core of their method and it produces a new re-estimated error 
matrix E'. Once E' is obtained, they compute the p-norm of E’ as the distance between X and Y. 
In the case when there are only two errors e and z in E, the formula 

z, ez>0 

e'(z) = (e+z), where z = ( 


where a = some positive constant and s = distance between e and z, is used for re-estimating the 
error e with respect to error z. The generalization to an arbitrary case is immediate. The 
experiments presented demonstrate the inconsistency of the conventional RMSE together with the 
success of the proposed method in simulating human visual perception. 

Nill and Bouzas [17] present an objective, quantitative image quality measure based on the digital 
image power spectrum of normally acquired arbitrary scenes. Using polar coordinates $\rho$, $\theta$, the 
image quality measure is derived from the normalized 2-D power spectrum $P(\rho, \theta)$ weighted by the 
square of the modulation transfer function of the human visual system $A^2(T\rho)$, the directional scale 
of the input image $S(\theta_1)$, and the modified Wiener noise filter $W(\rho)$: 

$$IQM = \frac{1}{M^2} \sum_{\theta=-180}^{180} \sum_{\rho=0.01}^{0.5} S(\theta_1) \, W(\rho) \, A^2(T\rho) \, P(\rho, \theta),$$

where $M^2$ = number of pixels. In its application, a previously constructed modulation transfer 
function [8] is used for the HVS. The authors point out that the power spectrum approach does 
not require use of designed quality assessment targets or reimaging the same scene for comparison 
purposes. Experimental verification indicates good correlation of this objective quality measure 
with visual quality assessments. 


3. Conclusions 


Traditionally, the most reliable way of measuring image quality has been the subjective evaluation 
by human observers. Because of the inherent difficulties associated with this approach, much 
attention has been focused on the development of quantitative techniques for quick and objective 
measurement. The image quality measure that has been commonly used in digital image 
compression is the mean square error (MSE) between the original image and the reconstructed 
image. It is now a well-known fact , however, that the MSE and its variants do not correlate 
reasonably well with subjective quality measures [4, 5, 7-10, 21]. A major portion of recent 
research is, therefore, directed towards incorporating human visual system (HVS) models into 
image quality measures. This is not a trivial task because the human visual system is too complex 
and an accurate model cannot presently be developed. Nevertheless, a number of experiments with 
simplified models indicate that the inclusion of a model for the HVS generally produces results 
that are in better correlation with the perceived image quality [4, 7, 8, 10-18, 22]. The trial models 
take into consideration various recognized characteristics of the HVS, and usually have both linear 
and nonlinear parts. As we gain a better understanding of the psychophysical phenomena 
concerning human vision, we will be able to develop more accurate models which, in turn, will 
lead to results closer to the human response. 